
9 Probability and Likelihood

In observing the natural world, one encounters “deterministic” events, characterized by relationships between the measured quantities that are clear relative to the experimental uncertainties, and more uncertain events with statistical outcomes (such as coin tossing or Mendelian gene segregation). The latter raise the general problem of how to assess the relative merits of alternative hypotheses in the light of the observed data. Statistics concerns itself with tests of significance and with estimation (i.e., seeking acceptable values for the parameters of the distributions specified by the hypotheses).

The method of support proposes that

\[
\text{posterior support} = \text{prior support} + \text{experimental support}
\]

and

\[
\text{information gained} = \log \frac{\text{posterior probability}}{\text{prior probability}} .
\]
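As a minimal numerical sketch (not from the text: the coin data, the two rival values of the heads-probability, and the use of Python are all invented for illustration), taking the support for one hypothesis against another to be the log-likelihood ratio shows how supports from independent experiments add:

```python
import math

def support(p, heads, tosses):
    # Log-likelihood of heads-probability p given `heads`
    # successes in `tosses` Bernoulli trials; the binomial
    # coefficient is omitted because it cancels when two
    # hypotheses are compared on the same data.
    return heads * math.log(p) + (tosses - heads) * math.log(1 - p)

# Two rival hypotheses about a coin (values chosen arbitrarily).
p_fair, p_biased = 0.5, 0.7

# Prior support for "biased" over "fair": an earlier run,
# say 6 heads in 10 tosses.
prior = support(p_biased, 6, 10) - support(p_fair, 6, 10)

# Experimental support from a new, independent run,
# say 15 heads in 20 tosses.
experimental = support(p_biased, 15, 20) - support(p_fair, 15, 20)

# Because independent log-likelihoods add, so do the supports:
# posterior support = prior support + experimental support.
posterior = prior + experimental
print(f"prior = {prior:.3f}")
print(f"experimental = {experimental:.3f}")
print(f"posterior = {posterior:.3f}")

# Cross-check: the posterior support equals the support computed
# directly from the pooled data (21 heads in 30 tosses).
pooled = support(p_biased, 21, 30) - support(p_fair, 21, 30)
assert abs(posterior - pooled) < 1e-9
```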

Two rival approaches to estimation have arisen: the theory of inverse probability (due to Laplace), in which the probabilities of causes (i.e., the hypotheses) are deduced from the frequencies of events, and the method of likelihood (due to Fisher). In the theory of inverse probability, these probabilities are interpreted as quantitative and absolute measures of belief. Although it still has its adherents, the system of inference based on inverse probability suffers from the weakness of supposing that hypotheses are selected from a continuum of infinitely many hypotheses. The prior probabilities have to be invented; for example, by imagining a chance setup, in which case the model is a private one and violates the principle of public demonstrability. Alternatively, one can apply Laplace’s “Principle of Insufficient Reason”, according to which each hypothesis is given the same probability if there are no grounds to believe otherwise. Conceptually, that viewpoint is rather hard to accept. Moreover, if there are infinitely many equiprobable hypotheses, then each one has an infinitesimal probability of being correct.

Bayes’ theorem (9.18) may be applied to the weighting of hypotheses if and only if the model adopted includes a chance setup for the generation of hypotheses with specific prior probabilities. Without that, the method becomes one of inverse probability. Equation (9.18) is interpreted as equating the posterior probability of the hypothesis $E_k$ (after having acquired data $A$) to our prior estimate of the correctness of $E_k$ (i.e., before any data were acquired), $P\{E_k\}$, multiplied by the prior probability of obtaining the data given the hypothesis (i.e., the likelihood; see below), the product being normalized by dividing by the sum over all hypotheses.
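For reference, and consistent with the verbal description just given (equation (9.18) itself is stated earlier in the chapter, so this is a reconstruction), it presumably has the familiar form

\[
P\{E_k \mid A\} = \frac{P\{A \mid E_k\}\, P\{E_k\}}{\sum_j P\{A \mid E_j\}\, P\{E_j\}} ,
\]

in which $P\{A \mid E_k\}$ is the likelihood of the hypothesis $E_k$ on the data $A$.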

A fundamental critique of Bayesian methods is that the Bayes–Laplace approach regards hypotheses as being drawn at random from a population of hypotheses, a certain proportion of which is true. “Bayesians” regard it as a strength that they can include prior knowledge, or rather prior states of belief, in the estimation of the